OcrV1, Main, Exploration, bibRecord, 000A85

Devanagari OCR using a recognition driven segmentation framework and stochastic language models

Internal identifier: 000A85 (Main/Exploration); previous: 000A84; next: 000A86

Authors: Suryaprakash Kompalli [United States]; Srirangaraj Setlur [United States]; Venugopal Govindaraju [United States]

Source :

International journal on document analysis and recognition : (Print) [ 1433-2833 ] ; 2009.

RBID : Pascal:10-0180818

French descriptors

Pascal (Inist)
- Reconnaissance caractère, Reconnaissance optique caractère, Concordance forme, Classification, Mot, Langage naturel, Automate stochastique, Automate fini, Machine état fini, Linguistique, Reconnaissance forme, Traitement image, Segmentation, Approche probabiliste, Modélisation, Méthode graphe, Théorie graphe, Appariement image, Modèle n gramme.
Wicri :
- topic : Classification, Linguistique.

English descriptors

KwdEn :
- Character recognition, Classification, Finite automaton, Finite state machine, Graph method, Graph theory, Image matching, Image processing, Linguistics, Modeling, N gram model, Natural language, Optical character recognition, Pattern matching, Pattern recognition, Probabilistic approach, Segmentation, Stochastic automaton, Word.

Abstract

This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.
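
The last two steps of the abstract (combining classifier results for primitive components into composite-character hypotheses, then choosing a word using both classifier scores and character frequencies) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the independence assumption behind multiplying component scores, the bigram table, and the `floor` smoothing value are all assumptions made for illustration, and a plain Viterbi search over a character lattice stands in for the paper's stochastic finite state automaton.

```python
import math
from itertools import product


def composite_hypotheses(component_results):
    """Enumerate composite-character hypotheses from per-component results.

    component_results: one list per primitive component, each holding
    (label, probability) pairs from a classifier (hypothetical format).
    Returns (labels, joint_score) pairs, best first. The joint score is a
    simple product, which assumes the component scores are independent.
    """
    hypotheses = []
    for combo in product(*component_results):
        labels = tuple(label for label, _ in combo)
        score = math.prod(p for _, p in combo)
        hypotheses.append((labels, score))
    return sorted(hypotheses, key=lambda h: h[1], reverse=True)


def best_word(char_hypotheses, bigram, floor=1e-6):
    """Pick the best character sequence with a Viterbi search.

    char_hypotheses: one list per character position, each holding
    (character, classifier_probability) pairs.
    bigram: dict mapping (previous_char, char) to a probability, with
    "<s>" marking the word start. Unseen pairs fall back to `floor`.
    """
    # best maps a character to (log-score, path) of the best path ending there.
    best = {"<s>": (0.0, [])}
    for position in char_hypotheses:
        following = {}
        for char, p_cls in position:
            for prev, (score, path) in best.items():
                p_lm = bigram.get((prev, char), floor)
                cand = score + math.log(max(p_cls, floor)) + math.log(p_lm)
                if char not in following or cand > following[char][0]:
                    following[char] = (cand, path + [char])
        best = following
    return max(best.values(), key=lambda v: v[0])[1]
```

With a lattice such as `[[("c", 0.4), ("e", 0.6)], [("a", 1.0)]]` and a bigram table that strongly favours "c" followed by "a", `best_word` selects "c" even though the classifier alone prefers "e". That is the kind of correction a language model is meant to provide in word recognition.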

Links toward previous steps (curation, corpus...)

to stream PascalFrancis, to step Corpus: 000194
to stream PascalFrancis, to step Curation: 000583
to stream PascalFrancis, to step Checkpoint: 000201
to stream Main, to step Merge: 000A95
to stream Main, to step Curation: 000A85

The document in XML format

<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en" level="a">Devanagari OCR using a recognition driven segmentation framework and stochastic language models</title>
<author><name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">INIST</idno>
<idno type="inist">10-0180818</idno>
<date when="2009">2009</date>
<idno type="stanalyst">PASCAL 10-0180818 INIST</idno>
<idno type="RBID">Pascal:10-0180818</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000194</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000583</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000201</idno>
<idno type="wicri:doubleKey">1433-2833:2009:Kompalli S:devanagari:ocr:using</idno>
<idno type="wicri:Area/Main/Merge">000A95</idno>
<idno type="wicri:Area/Main/Curation">000A85</idno>
<idno type="wicri:Area/Main/Exploration">000A85</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en" level="a">Devanagari OCR using a recognition driven segmentation framework and stochastic language models</title>
<author><name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
</affiliation>
</author>
<author><name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<affiliation wicri:level="4"><inist:fA14 i1="01"><s1>Department of Computer Science and Engineering, University at Buffalo, State University of New York</s1>
<s2>Buffalo</s2>
<s3>USA</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>États-Unis</country>
<wicri:noRegion>Buffalo</wicri:noRegion>
<orgName type="university">Université d'État de New York à Buffalo</orgName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<placeName><settlement type="city">Buffalo (New York)</settlement>
<region type="state">État de New York</region>
</placeName>
<orgName type="university" n="3">Université d'État de New York à Buffalo</orgName>
<orgName type="institution">Université d'État de New York</orgName>
</affiliation>
</author>
</analytic>
<series><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
<imprint><date when="2009">2009</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt><title level="j" type="main">International journal on document analysis and recognition : (Print)</title>
<title level="j" type="abbreviated">Int. j. doc. anal. recognit. : (Print)</title>
<idno type="ISSN">1433-2833</idno>
</seriesStmt>
</fileDesc>
<profileDesc><textClass><keywords scheme="KwdEn" xml:lang="en"><term>Character recognition</term>
<term>Classification</term>
<term>Finite automaton</term>
<term>Finite state machine</term>
<term>Graph method</term>
<term>Graph theory</term>
<term>Image matching</term>
<term>Image processing</term>
<term>Linguistics</term>
<term>Modeling</term>
<term>N gram model</term>
<term>Natural language</term>
<term>Optical character recognition</term>
<term>Pattern matching</term>
<term>Pattern recognition</term>
<term>Probabilistic approach</term>
<term>Segmentation</term>
<term>Stochastic automaton</term>
<term>Word</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr"><term>Reconnaissance caractère</term>
<term>Reconnaissance optique caractère</term>
<term>Concordance forme</term>
<term>Classification</term>
<term>Mot</term>
<term>Langage naturel</term>
<term>Automate stochastique</term>
<term>Automate fini</term>
<term>Machine état fini</term>
<term>Linguistique</term>
<term>Reconnaissance forme</term>
<term>Traitement image</term>
<term>Segmentation</term>
<term>Approche probabiliste</term>
<term>Modélisation</term>
<term>Méthode graphe</term>
<term>Théorie graphe</term>
<term>.</term>
<term>Appariement image</term>
<term>Modèle n gramme</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr"><term>Classification</term>
<term>Linguistique</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">This paper describes a novel recognition driven segmentation methodology for Devanagari Optical Character Recognition. Prior approaches have used sequential rules to segment characters followed by template matching for classification. Our method uses a graph representation to segment characters. This method allows us to segment horizontally or vertically overlapping characters as well as those connected along non-linear boundaries into finer primitive components. The components are then processed by a classifier and the classifier score is used to determine if the components need to be further segmented. Multiple hypotheses are obtained for each composite character by considering all possible combinations of the classifier results for the primitive components. Word recognition is performed by designing a stochastic finite state automaton (SFSA) that takes into account both classifier scores as well as character frequencies. A novel feature of our approach is that we use sub-character primitive components in the classification stage in order to reduce the number of classes whereas we use an n-gram language model based on the linguistic character units for word recognition.</div>
</front>
</TEI>
<affiliations><list><country><li>États-Unis</li>
</country>
<region><li>État de New York</li>
</region>
<settlement><li>Buffalo (New York)</li>
</settlement>
<orgName><li>Université d'État de New York</li>
<li>Université d'État de New York à Buffalo</li>
</orgName>
</list>
<tree><country name="États-Unis"><region name="État de New York"><name sortKey="Kompalli, Suryaprakash" sort="Kompalli, Suryaprakash" uniqKey="Kompalli S" first="Suryaprakash" last="Kompalli">Suryaprakash Kompalli</name>
</region>
<name sortKey="Govindaraju, Venu" sort="Govindaraju, Venu" uniqKey="Govindaraju V" first="Venu" last="Govindaraju">Venugopal Govindaraju</name>
<name sortKey="Setlur, Srirangaraj" sort="Setlur, Srirangaraj" uniqKey="Setlur S" first="Srirangaraj" last="Setlur">Srirangaraj Setlur</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration

HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000A85 | SxmlIndent | more

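Or, assuming EXPLOR_AREA is already set to the area root (presumably $WICRI_ROOT/Ticri/CIDE/explor/OcrV1):

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000A85 | SxmlIndent | more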

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:10-0180818
   |texte=   Devanagari OCR using a recognition driven segmentation framework and stochastic language models
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024

OCR exploration server
Warning: this site is under development. It was generated automatically from raw corpora, so the information has not been validated.
